1 Introduction

This R Markdown script contains all the code used for outlier detection, data analysis, and plotting, including the additional statistical analyses and all statistical models with their summaries.

2 Experiment 1

2.1 Participant exclusions

# Loading the data to filter out participants
dataExp1 <- read.delim("./dataAnonNotFilteredExp1_added.txt", sep ="\t", header = TRUE)

# Remove 11 suspicious participants
rmParticipant1Exp1 <- c("5970a90e","6e5ccd20","9e75832a","86687167","64afd2cc","429604bc","409f2599","d6390ddc","7ab6a320","bd90455b","39b29c2b")
dataExp1 <- dataExp1[!dataExp1$workerId %in% rmParticipant1Exp1,]

# Calculate dprime
# Get the hit rate: 4 or 5 to real words
dataExp1$accReal <- ifelse(dataExp1$type=="real" & dataExp1$enteredResponse %in% c(4,5),1,0)
hrateT <- aggregate(accReal ~ workerId, sum, data=dataExp1)
hrateT$hit_rate <- round(hrateT$accReal/132,3)

# Manually correct the hit rate of one participant (who completed one fewer real-word item)
#length(which(dataExp1$type=="real" & dataExp1$workerId=="c0999383"))
hrateT[hrateT$workerId=="c0999383",]$hit_rate <- round(hrateT[hrateT$workerId=="c0999383",]$accReal/131,3)

# Get the FA (false alarm) rate: 4-5 to nonwords
dataExp1$accNon <- ifelse(dataExp1$type=="non" & dataExp1$enteredResponse %in% c(4,5),1,0)
farateT <- aggregate(accNon ~ workerId, sum, data=dataExp1)
farateT$fa_rate <- round(farateT$accNon/209,3)

# Manually correct the FA rate of one participant (who completed two fewer nonword items)
#length(which(dataExp1$type=="non" & dataExp1$workerId=="f06d0f73"))
farateT[farateT$workerId=="f06d0f73",]$fa_rate <- round(farateT[farateT$workerId=="f06d0f73",]$accNon/207,3)
dprime <- merge(hrateT, farateT, by="workerId")
dprime$dprime <- round(qnorm(dprime$hit_rate) - qnorm(dprime$fa_rate),3)
dprime <- dprime[c("workerId","dprime")]
drop <- c("accReal","accNon")
dataExp1 <- dataExp1[,!(names(dataExp1) %in% drop)]
dataExp1 <- merge(dataExp1, dprime, by="workerId")
rm(dprime, hrateT, farateT)

# Remove 4 participants whose d-prime value is lower than 0
rmParticipant2Exp1 <- unique(dataExp1[dataExp1$dprime < 0,]$workerId) 
# a29fe31d e1f518e6 efdd3439 f57a3633 
dataExp1 <- dataExp1[!dataExp1$workerId %in% rmParticipant2Exp1,]           

# Remove one native speaker of Mandarin Chinese
dataExp1 <- dataExp1[!dataExp1$firstLang=="Mandarin",]

# Remove 11 participants whose speakMaori or compMaori is equal to or above 3
rmParticipant3Exp1 <- unique(dataExp1[dataExp1$speakMaori >= 3 | dataExp1$compMaori >= 3,]$workerId)
# 007a1752 170ce007 20c10896 3128bb29 66d0a920 75caca3b b00cd565 d3cd7085 de95cdaf eef4d9c0 fab8a51f
dataExp1 <- dataExp1[!dataExp1$workerId %in% rmParticipant3Exp1,]

# Remove one participant who did not learn their English in NZ and has been living overseas for more than two years.
# Note: the filter below tests firstLangCountry and place only; duration is not checked explicitly.
summaryExp1WorkerId <- unique(dataExp1[,c("workerId","firstLangCountry","place","duration")])
EngNotInNZExp1 <- summaryExp1WorkerId[!summaryExp1WorkerId$firstLangCountry=="NZ",]
rmParticipant4Exp1 <- unique(EngNotInNZExp1[EngNotInNZExp1$place=="overseas",]$workerId) 
# 880242c2
dataExp1 <- dataExp1[!dataExp1$workerId %in% rmParticipant4Exp1,]

# Detect participants whose median reactionTime is more than 2 SD below the mean of all participants' medians
median_RT <- aggregate(dataExp1$reactionTime, by=list(dataExp1$workerId), median)
names(median_RT) <- c("workerId","median")
cut <- mean(median_RT$median)-2*sd(median_RT$median)
# median_RT[!median_RT$median > cut,]$workerId # None detected!

# Check the total number of usable participants for Exp1
#length(unique(dataExp1$workerId)) # 101
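
The d′ step above takes the z-transformed hit rate minus the z-transformed false-alarm rate via qnorm(). The following is a minimal sketch of that arithmetic on made-up counts; the helper name dprime_from_counts and the toy data frame are hypothetical and not part of the original script. It also illustrates why a hit rate of exactly 1 gives an infinite d′, which is presumably why the later model code keeps only rows where is.finite(dprime) holds.

```r
# Hypothetical helper (not in the original script): d' from raw counts.
dprime_from_counts <- function(hits, n_real, false_alarms, n_non) {
  hit_rate <- hits / n_real
  fa_rate  <- false_alarms / n_non
  qnorm(hit_rate) - qnorm(fa_rate)
}

# Made-up counts for three illustrative participants, using the stimulus
# totals that appear in the script (132 real words, 209 nonwords).
toy <- data.frame(
  workerId     = c("p1", "p2", "p3"),
  hits         = c(100, 60, 132),   # p3 rated every real word 4 or 5
  false_alarms = c(50, 120, 40)
)
toy$dprime <- dprime_from_counts(toy$hits, 132, toy$false_alarms, 209)

# p2 has more false alarms than hits in relative terms, so d' < 0
# and the participant would be excluded by the d' < 0 rule.
toy$workerId[toy$dprime < 0]

# p3 has a hit rate of 1, and qnorm(1) is Inf, so d' is not finite.
toy$workerId[!is.finite(toy$dprime)]
```

Unlike the script, this sketch does not round the rates or the resulting d′ to three decimals.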

2.2 Dataset structure

The data is structured as follows:

  • workerId is the unique ID for each participant.
  • enteredResponse is the wellformedness rating for each stimulus.
  • reactionTime is the reaction time for each rating (seconds).
  • type is the classification of each stimulus: nonword (‘non’) or word (‘real’).
  • length is the phoneme length of each stimulus.
  • word is the stimulus used for the rating.
  • speakMaori is each participant’s report of how well they can speak Māori (on a scale from 0 to 5).
  • compMaori is each participant’s report of how well they can understand/read Māori (on a scale from 0 to 5).
  • maoriProf is the sum of the quantified responses for speakMaori and compMaori (participant Māori proficiency).
  • age is the age group for each participant.
  • gender is the gender reported by each participant.
  • ethnicity is categorized into binary answers, either Māori (M) or non-Māori (non M).
  • education is each participant’s highest level of education.
  • children is each participant’s report of whether they have had any children who have attended preschool or primary school in New Zealand in the past five years.
  • maoriList is each participant’s basic knowledge of Māori (with a scale ranging from 0 to 9).
  • place is each participant’s current place of living (3 levels: NZ North Island, NZ South Island, or Overseas).
  • duration is each participant’s time living in their current place (2 levels: long is > 2 years; short is ≤ 2 years).
  • firstLang is each participant’s first language.
  • firstLangCountry is the country where each participant learned their first language.
  • anyOtherLangs is any other languages each participant reports speaking.
  • hawaii is the binary response to the question whether participants have lived in Hawaii.
  • anyPolynesian is the binary response to the question whether participants know any Polynesian languages such as Hawaiian, Tahitian, Sāmoan, or Tongan.
  • whichPolynesian is the information regarding participants’ knowledge of Polynesian languages, if they know any.
  • impairments is the answer to the question whether participants have a history of any speech or language impairments.
  • maoriExpo is each participant’s level of exposure to Māori (with a scale ranging from 0 to 10).
  • scoreDictType is the type-based dictionary phonotactic score normalized by the phonemic length of each stimulus.
  • scoreDictToken is the frequency-weighted dictionary phonotactic score normalized by the phonemic length of each stimulus.
  • scoreRsSeg is the segmented running speech phonotactic score normalized by the phonemic length of each stimulus.
  • n.neighbors is the number of words (from the Māori dictionary) that can be reached by adding, deleting, or substituting one phoneme in each stimulus.
  • mean.neighbor.logfreq is the frequency-weighted phonological neighbourhood density.
  • dprime is the measure of sensitivity for each participant’s performance.
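
The n.neighbors definition above (words reachable by adding, deleting, or substituting one phoneme) corresponds to an edit distance of exactly 1. A minimal sketch of that count, assuming each phoneme is coded as a single character; the toy lexicon is made up for illustration and is not the dictionary used in the study.

```r
# Toy lexicon and stimulus; every character is assumed to stand for one phoneme.
lexicon  <- c("kai", "kei", "ka", "kaia", "mai", "wai", "roto")
stimulus <- "kai"

# utils::adist() returns the Levenshtein distance, i.e. the minimum number of
# single-character insertions, deletions, and substitutions.
dists <- utils::adist(stimulus, lexicon)[1, ]

# Neighbours are lexicon entries exactly one edit away; the stimulus itself
# (distance 0) is not counted.
neighbours   <- lexicon[dists == 1]
n_neighbours <- length(neighbours)
n_neighbours
```

Digraphs such as "ng" or "wh" would need to be recoded to single symbols before applying this to real transcriptions.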

2.3 Figure 1: Overview of participants’ sociolinguistic profile in Experiment 1

Figure 1: Overview of participants' sociolinguistic profile in Experiment 1. Bars are labeled with their counts for each category.

2.4 Figure 3: Length distribution of real word stimuli

Figure 3: Length distribution of real word stimuli. The length of stimulus (the number of phonemes) is represented on the x-axis and the number of stimuli is represented on the y-axis.

2.5 Figure 4: Average rating per word

Figure 4: Average word ratings by phonotactic score. Overlapping labels are not plotted.

2.6 Table 1: Modeling confidence ratings with an ordinal mixed effects model

# Model for Table 1
dataExp1DprimeFinite <- dataExp1[is.finite(dataExp1$dprime),]
dataExp1DprimeFinite$macron <- FALSE
dataExp1DprimeFinite[grepl("ā|ē|ī|ō|ū",dataExp1DprimeFinite$word),]$macron <- TRUE
dataExp1DprimeFinite$macron <- as.factor(dataExp1DprimeFinite$macron)
dataExp1DprimeFinite$enteredResponse <- as.factor(dataExp1DprimeFinite$enteredResponse)
# modelTable1 <- clmm(enteredResponse ~ macron*type*c.(dprime) + c.(n.neighbors)*type*c.(dprime) + c.(length)*c.(dprime) + c.(scoreDictToken) + (1 + macron*type + c.(n.neighbors)*type| workerId) + (1 + c.(length) + c.(scoreDictToken)| workerId) + (0 + c.(dprime)|word), data=dataExp1DprimeFinite)
# saveRDS(modelTable1, file = "modelTable1.rds")
modelTable1 <- readRDS("./modelTable1.rds")  
clm_table(modelTable1, caption="Table 1: Model summary of confidence ratings with an ordinal mixed effects model. All numeric variables in this model are centered.")
Table 1: Model summary of confidence ratings with an ordinal mixed effects model. All numeric variables in this model are centered.
Parameter                                                   Estimate  Std. Error    \(z\)  \(p\)
Effects
  macron = TRUE                                                1.043       0.073   14.291  <0.001 ***
  type = real                                                  3.627       0.106   34.254  <0.001 ***
  dprime (centered)                                           -0.215       0.175   -1.228  0.219
  n.neighbors (centered)                                       0.034       0.004    8.364  <0.001 ***
  length (centered)                                            0.045       0.011    3.953  <0.001 ***
  scoreDictToken (centered)                                    0.977       0.060   16.372  <0.001 ***
  macron = TRUE × type = real                                 -0.478       0.086   -5.545  <0.001 ***
  macron = TRUE × dprime (centered)                           -0.628       0.132   -4.741  <0.001 ***
  type = real × dprime (centered)                              1.489       0.173    8.610  <0.001 ***
  type = real × n.neighbors (centered)                        -0.018       0.005   -3.664  <0.001 ***
  dprime (centered) × n.neighbors (centered)                   0.018       0.010    1.776  0.076 .
  dprime (centered) × length (centered)                       -0.064       0.024   -2.635  0.008 **
  macron = TRUE × type = real × dprime (centered)              0.408       0.177    2.304  0.021 *
  type = real × dprime (centered) × n.neighbors (centered)    -0.024       0.011   -2.191  0.028 *
Thresholds
  1|2                                                         -2.800       0.115
  2|3                                                         -0.611       0.113
  3|4                                                          1.396       0.113
  4|5                                                          2.668       0.114
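
To read the thresholds in Table 1, it can help to convert them into rating probabilities. The sketch below assumes the default logit link of clmm(), under which P(Y ≤ j) = plogis(θ_j − η) for linear predictor η, and sets all random effects to zero. The function name rating_probs is hypothetical; the numeric values are copied from Table 1.

```r
# Thresholds copied from Table 1.
theta <- c("1|2" = -2.800, "2|3" = -0.611, "3|4" = 1.396, "4|5" = 2.668)

# Hypothetical helper: rating probabilities for a linear predictor eta,
# assuming a logit link, i.e. P(Y <= j) = plogis(theta_j - eta).
rating_probs <- function(eta, theta) {
  cum   <- c(plogis(theta - eta), 1)  # cumulative probabilities; the last is 1
  probs <- diff(c(0, cum))            # probability of each rating category
  names(probs) <- seq_along(probs)
  probs
}

# Reference case: nonword, no macron, all centered predictors at their means.
p_non <- rating_probs(0, theta)

# Same case for a real word: add the 'type = real' estimate. The interaction
# terms drop out because macron is FALSE and the centered predictors are 0.
p_real <- rating_probs(3.627, theta)

# Expected rating under each predicted distribution.
mean_non  <- sum(1:5 * p_non)
mean_real <- sum(1:5 * p_real)
```

This mirrors the two views used in the effect plots: predicted distributions over ratings (p_non, p_real) and predicted mean ratings (mean_non, mean_real).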

2.7 Figure 5: Effect plots from the model presented in Table 1

Figure 5: Effect plots of the interaction between the presence of macrons and stimulus type (nonword vs. real word) at the top (Fig. 5a), the interaction between neighbourhood density and stimulus type in the middle (Fig. 5b), the interaction between d′ and the presence of macrons (Fig. 5c), and the frequency-weighted dictionary phonotactic score at the bottom (Fig. 5d). Plots on the left show predicted mean ratings and plots on the right show predicted distributions over ratings.

2.8 Table S1: Modeling confidence ratings with an ordinal mixed effects model with a subset of data

# Model for Table S1
dataExp1DScore <- dataExp1DprimeFinite[dataExp1DprimeFinite$scoreDictToken > min(dataExp1DprimeFinite[dataExp1DprimeFinite$type=="real",]$scoreDictToken),]
# modelTableS1 <- clmm(enteredResponse ~ macron*type*c.(dprime) + c.(n.neighbors)*type*c.(dprime) + c.(length)*c.(dprime) + c.(scoreDictToken) + (1 + macron*type + c.(n.neighbors)*type| workerId) + (1 + c.(length) + c.(scoreDictToken)| workerId) + (0 + c.(dprime)|word), data=dataExp1DScore)
# saveRDS(modelTableS1, file = "modelTableS1.rds")
modelTableS1 <- readRDS("./modelTableS1.rds")  
clm_table(modelTableS1, caption="Table S1: Model summary of confidence ratings with an ordinal mixed effects model with a subset of data after discarding 50 nonwords based on token-based dictionary phonotactic scores. All numeric variables in this model are centered.")
Table S1: Model summary of confidence ratings with an ordinal mixed effects model with a subset of data after discarding 50 nonwords based on token-based dictionary phonotactic scores. All numeric variables in this model are centered.
Parameter                                                   Estimate  Std. Error    \(z\)  \(p\)
Effects
  macron = TRUE                                                0.969       0.077   12.566  <0.001 ***
  type = real                                                  3.457       0.119   29.124  <0.001 ***
  dprime (centered)                                           -0.219       0.194   -1.124  0.261
  n.neighbors (centered)                                       0.026       0.005    5.675  <0.001 ***
  length (centered)                                            0.033       0.013    2.605  0.009 **
  scoreDictToken (centered)                                    1.319       0.079   16.780  <0.001 ***
  macron = TRUE × type = real                                 -0.540       0.093   -5.788  <0.001 ***
  macron = TRUE × dprime (centered)                           -0.678       0.146   -4.644  <0.001 ***
  type = real × dprime (centered)                              1.530       0.196    7.811  <0.001 ***
  type = real × n.neighbors (centered)                        -0.015       0.005   -2.900  0.004 **
  dprime (centered) × n.neighbors (centered)                   0.019       0.011    1.702  0.089 .
  dprime (centered) × length (centered)                       -0.057       0.027   -2.135  0.033 *
  macron = TRUE × type = real × dprime (centered)              0.384       0.193    1.995  0.046 *
  type = real × dprime (centered) × n.neighbors (centered)    -0.024       0.012   -2.091  0.037 *
Thresholds
  1|2                                                         -2.842       0.124
  2|3                                                         -0.725       0.121
  3|4                                                          1.240       0.121
  4|5                                                          2.520       0.122
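
The model formulas wrap numeric predictors in c.(), whose definition is not shown in this section. Since the captions state that all numeric variables are centered, a plausible definition is mean-centering; the sketch below is an assumption about that helper, not the authors' confirmed code, and uses made-up values.

```r
# Assumed definition of the centering helper used in the model formulas:
# subtract the mean so that 0 corresponds to the sample average.
c. <- function(x) x - mean(x, na.rm = TRUE)

# Made-up phoneme lengths for illustration.
lengths_toy <- c(3, 4, 4, 6, 8)
centered    <- c.(lengths_toy)

centered        # deviations from the mean length
mean(centered)  # zero, up to floating-point error
```

With centered predictors, each main effect in the tables is evaluated at the mean of the variables it interacts with.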

2.9 Statistical models with self-reported survey measures

2.9.1 Model A: Participants’ self-evaluated level of Māori proficiency

dataExp1$macron <- FALSE
dataExp1[grepl("ā|ē|ī|ō|ū",dataExp1$word),]$macron <- TRUE
dataExp1$macron <- as.factor(dataExp1$macron)
dataExp1$enteredResponse <- as.factor(dataExp1$enteredResponse)
# modelA <- clmm(enteredResponse ~ c.(scoreDictToken) + c.(n.neighbors) + macron + c.(length)*c.(maoriProf) + type*c.(maoriProf) + (1 + c.(scoreDictToken) + c.(n.neighbors) + macron | workerId) + (1 + c.(length) + type | workerId) + (0 + c.(maoriProf) | word), data=dataExp1)
# saveRDS(modelA, file = "modelA.rds")
modelA <- readRDS("./modelA.rds")  
clm_table(modelA, caption="Table S2: Model summary of confidence ratings with an ordinal mixed effects model. All numeric variables in this model are centered.")
Table S2: Model summary of confidence ratings with an ordinal mixed effects model. All numeric variables in this model are centered.
Parameter                                   Estimate  Std. Error    \(z\)  \(p\)
Effects
  scoreDictToken (centered)                    0.989       0.062   16.041  <0.001 ***
  n.neighbors (centered)                       0.029       0.003   10.170  <0.001 ***
  macron = TRUE                                0.950       0.076   12.494  <0.001 ***
  length (centered)                            0.042       0.012    3.499  <0.001 ***
  maoriProf (centered)                        -0.062       0.126   -0.488  0.626
  type = real                                  3.581       0.148   24.277  <0.001 ***
  length (centered) × maoriProf (centered)    -0.031       0.014   -2.242  0.025 *
  maoriProf (centered) × type = real           0.520       0.180    2.895  0.004 **
Thresholds
  1|2                                         -2.769       0.116
  2|3                                         -0.642       0.113
  3|4                                          1.396       0.114
  4|5                                          2.662       0.114

2.9.2 Figure S1: Effect plots from Model A (Participants’ self-evaluated level of Māori proficiency)

Figure S1: Effect plots of phonotactic score (Fig.S1a) and neighbourhood density (Fig.S1b). Plots on the left show predicted mean ratings and plots on the right show predicted distributions over ratings.

2.9.3 Model B: Participants’ self-reported level of exposure to Māori

# modelB <- clmm(enteredResponse ~ c.(scoreDictToken) + c.(n.neighbors) + macron  + type +  c.(length) + c.(maoriExpo) + (1 + c.(scoreDictType) + c.(n.neighbors) + macron | workerId) + (1 + type + c.(length) | workerId) + (0 + c.(maoriExpo) |word), data=dataExp1)
# saveRDS(modelB, file = "modelB.rds")
modelB <- readRDS("./modelB.rds")   
clm_table(modelB, caption="Table S3: Model summary of confidence ratings with an ordinal mixed effects model. All numeric variables in this model are centered.")
Table S3: Model summary of confidence ratings with an ordinal mixed effects model. All numeric variables in this model are centered.
Parameter                     Estimate  Std. Error    \(z\)  \(p\)
Effects
  scoreDictToken (centered)      0.937       0.051   18.427  <0.001 ***
  n.neighbors (centered)         0.029       0.003    9.753  <0.001 ***
  macron = TRUE                  0.838       0.075   11.186  <0.001 ***
  type = real                    3.566       0.152   23.435  <0.001 ***
  length (centered)              0.039       0.012    3.281  0.001 **
  maoriExpo (centered)           0.143       0.041    3.510  <0.001 ***
Thresholds
  1|2                           -2.747       0.122
  2|3                           -0.624       0.119
  3|4                            1.412       0.120
  4|5                            2.677       0.121

2.9.4 Figure S2: Effect plots from Model B (Participants’ self-reported level of exposure to Māori)

Figure S2: Effect plots of phonotactic score (Fig.S2a) and neighbourhood density (Fig.S2b). Plots on the left show predicted mean ratings and plots on the right show predicted distributions over ratings.

2.9.5 Model C: Participants’ basic knowledge of Māori

# modelC <- clmm(enteredResponse ~  c.(n.neighbors)*c.(scoreDictToken)*c.(maoriList) + type*c.(maoriList) + c.(length) + macron  +  (1 + c.(n.neighbors)*c.(scoreDictToken) + type | workerId) + (1 + c.(length) + macron | workerId) + (0 + c.(maoriList) | word), data=dataExp1)
# saveRDS(modelC, file = "modelC.rds")
modelC <- readRDS("./modelC.rds")   
clm_table(modelC, caption="Table S4: Model summary of confidence ratings with an ordinal mixed effects model. All numeric variables in this model are centered.")
Table S4: Model summary of confidence ratings with an ordinal mixed effects model. All numeric variables in this model are centered.
Parameter                                                                    Estimate  Std. Error    \(z\)  \(p\)
Effects
  n.neighbors (centered)                                                        0.027       0.003    8.527  <0.001 ***
  scoreDictToken (centered)                                                     0.980       0.059   16.573  <0.001 ***
  maoriList (centered)                                                          0.065       0.045    1.423  0.155
  type = real                                                                   3.605       0.133   27.030  <0.001 ***
  length (centered)                                                             0.043       0.012    3.648  <0.001 ***
  macron = TRUE                                                                 0.949       0.076   12.442  <0.001 ***
  n.neighbors (centered) × scoreDictToken (centered)                            0.009       0.007    1.367  0.172
  n.neighbors (centered) × maoriList (centered)                                 0.004       0.002    2.738  0.006 **
  scoreDictToken (centered) × maoriList (centered)                              0.013       0.034    0.386  0.700
  maoriList (centered) × type = real                                            0.276       0.059    4.697  <0.001 ***
  n.neighbors (centered) × scoreDictToken (centered) × maoriList (centered)    -0.014       0.004   -3.593  <0.001 ***
Thresholds
  1|2                                                                          -2.784       0.110
  2|3                                                                          -0.648       0.107
  3|4                                                                           1.405       0.108
  4|5                                                                           2.685       0.109

2.9.6 Figure S3: Effect plots from Model C (Participants’ basic knowledge of Māori)

Figure S3: Effect plots of phonotactic score (Fig.S3a) and neighbourhood density (Fig.S3b). Plots on the left show predicted mean ratings and plots on the right show predicted distributions over ratings.

3 Experiment 2

3.1 Participant exclusions

# Loading the data to filter out participants
dataExp2 <- read.delim("./dataAnonNotFilteredExp2_added.txt", sep ="\t", header = TRUE)

# Remove 9 participants whose speakMaori or compMaori is equal to or above 3
rmParticipant1Exp2 <- unique(dataExp2[dataExp2$speakMaori >= 3 | dataExp2$compMaori >= 3,]$workerId) 
# 1b817b93 f44d2e0f 2d528dd7 90e5e8cd aa592fef a95bbe09 35ace421 b3f52c92 e7a6cb02
dataExp2 <- dataExp2[!dataExp2$workerId %in% rmParticipant1Exp2,]

# Remove one participant who did not learn their English in NZ and has been living overseas for more than two years (duration == "long")
# Note: the filter below tests firstLangCountry and place only; duration is not checked explicitly.
summaryExp2WorkerId <- unique(dataExp2[,c("workerId","firstLangCountry","place","duration")])
EngNotInNZExp2 <- summaryExp2WorkerId[!summaryExp2WorkerId$firstLangCountry=="NZ",]
rmParticipant2Exp2 <- unique(EngNotInNZExp2[EngNotInNZExp2$place=="overseas",]$workerId) 
# 1e48b18a
dataExp2 <- dataExp2[!dataExp2$workerId %in% rmParticipant2Exp2,]

# Detect participants whose median reactionTime is more than 2 SD below the mean of all participants' medians
median_RT <- aggregate(dataExp2$reactionTime, by=list(dataExp2$workerId), median)
names(median_RT) <- c("workerId","median")
cut <- mean(median_RT$median)-2*sd(median_RT$median)
# median_RT[!median_RT$median > cut,]$workerId # None detected!

# Remove a participant with joke answers
dataExp2 <- dataExp2[!dataExp2$workerId=="eaed6b4d",]

# Check the total number of usable participants for Exp2
# length(unique(dataExp2$workerId)) # 123
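
The reaction-time check used in both experiments flags participants whose median reactionTime falls below the mean of all participants' medians minus two standard deviations. A minimal sketch on made-up data (ten hypothetical participants, three trials each) shows the rule flagging one unusually fast responder; in the real data no participant was flagged.

```r
# Made-up trial-level data: nine typical participants and one very fast one.
ids <- sprintf("p%02d", 1:10)
toy <- data.frame(
  workerId     = rep(ids, each = 3),
  reactionTime = c(rep(c(1.8, 2.0, 2.2), times = 9), c(0.1, 0.2, 0.3))
)

# One median per participant, as in the script.
median_RT <- aggregate(reactionTime ~ workerId, data = toy, FUN = median)
names(median_RT) <- c("workerId", "median")

# Cutoff: mean of the participant medians minus 2 standard deviations.
cut <- mean(median_RT$median) - 2 * sd(median_RT$median)

# Participants at or below the cutoff (same condition as `!median > cut`).
flagged <- as.character(median_RT$workerId[!median_RT$median > cut])
flagged
```

With very few participants the rule cannot fire at all, because a single value cannot lie more than (n − 1)/√n sample standard deviations from the mean; that is why the toy data uses ten participants.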

3.2 Dataset structure

The data is structured as follows:

  • workerId is the unique ID for each participant.
  • definition is each participant’s entered definition for each stimulus.
  • coding is the marking for each definition.
  • correct is binary: either the definition is correct (TRUE) or incorrect (FALSE).
  • confidence is each participant’s confidence rating for each definition (on a scale from 0 to 5).
  • familiarity is the average rating for each Māori word obtained from the 101 non-Māori-speaking (NMS) New Zealanders in Experiment 1.
  • reactionTime is the reaction time for each rating (seconds).
  • length is the phoneme length of each stimulus.
  • word is the stimulus used for the rating.
  • speakMaori is each participant’s report of how well they can speak Māori (on a scale from 0 to 5).
  • compMaori is each participant’s report of how well they can understand/read Māori (on a scale from 0 to 5).
  • maoriProf is the sum of the quantified responses for speakMaori and compMaori (participant Māori proficiency).
  • age is the age group for each participant.
  • gender is the gender reported by each participant.
  • ethnicity is categorized into binary answers, either Māori (M) or non-Māori (non M).
  • education is each participant’s highest level of education.
  • children is each participant’s report of whether they have had any children who have attended preschool or primary school in New Zealand in the past five years.
  • maoriList is each participant’s basic knowledge of Māori (with a scale ranging from 0 to 9).
  • place is each participant’s current place of living (3 levels: NZ North Island, NZ South Island, or Overseas).
  • duration is each participant’s time living in their current place (2 levels: long is > 2 years; short is ≤ 2 years).
  • firstLang is each participant’s first language.
  • firstLangCountry is the country where each participant learned their first language.
  • anyOtherLangs is any other languages each participant reports speaking.
  • hawaii is the binary response to the question whether participants have lived in Hawaii.
  • anyPolynesian is the binary response to the question whether participants know any Polynesian languages such as Hawaiian, Tahitian, Sāmoan, or Tongan.
  • whichPolynesian is the information regarding participants’ knowledge of Polynesian languages, if they know any.
  • impairments is the answer to the question whether participants have a history of any speech or language impairments.
  • maoriExpo is each participant’s level of exposure to Māori (with a scale ranging from 0 to 10).
  • scoreDictType is the type-based dictionary phonotactic score normalized by the phonemic length of each stimulus.
  • scoreDictToken is the frequency-weighted dictionary phonotactic score normalized by the phonemic length of each stimulus.
  • scoreRsSeg is the segmented running speech phonotactic score normalized by the phonemic length of each stimulus.
  • n.neighbors is the number of words (from the Māori dictionary) that can be reached by adding, deleting, or substituting one phoneme in each stimulus.
  • mean.neighbor.logfreq is the frequency-weighted phonological neighbourhood density.

3.3 Figure 2: Overview of participants’ sociolinguistic profile in Experiment 2

Figure 2: Overview of participants’ sociolinguistic profile in Experiment 2. Bars are labeled with their counts for each category.

3.4 Figure 6: Rate of accurate definitions per word

Figure 6: Rate of accurate definitions per word. Words are positioned by their phonotactic score on the x-axis and by their accuracy rate on the y-axis. Overlapping labels are not shown.

3.5 Modeling accuracy with a generalized linear mixed effects model

3.5.1 Table 2: Participants’ basic knowledge of Māori with running speech phonotactics

# Model for Table 2
# modelTable2 <- glmer(correct ~ c.(scoreRsSeg) + c.(familiarity) + c.(maoriList) + (1 + c.(scoreRsSeg) + c.(familiarity) |workerId) + (1+ c.(maoriList)|word), data=dataExp2, control=glmerControl(optimizer="bobyqa"), family=binomial(link="logit"))
# saveRDS(modelTable2, file = "modelTable2.rds")
modelTable2 <- readRDS("./modelTable2.rds")
kable(xtable(summary(modelTable2)$coef), digits=3, escape=F, full_width=T, caption="Table 2: Model summary for accuracy with a generalized linear mixed effects model. All numeric variables in this model are centered.")
Table 2: Model summary for accuracy with a generalized linear mixed effects model. All numeric variables in this model are centered.
                    Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)            0.803       0.157    5.117     0.000
c.(scoreRsSeg)        -1.569       0.716   -2.191     0.028
c.(familiarity)        5.563       0.454   12.258     0.000
c.(maoriList)          0.461       0.042   10.844     0.000
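
The estimates in Table 2 are on the log-odds scale. The sketch below converts them into predicted probabilities and an odds ratio for fixed effects only (random effects set to zero); the coefficient values are copied from Table 2 and the variable names are chosen for illustration.

```r
# Fixed-effect estimates copied from Table 2 (log-odds scale).
b <- c(intercept = 0.803, scoreRsSeg = -1.569,
       familiarity = 5.563, maoriList = 0.461)

# Predicted accuracy with all centered predictors at their means (i.e. 0).
p_avg <- unname(plogis(b["intercept"]))

# Predicted accuracy when maoriList is one point above its mean.
p_plus1 <- unname(plogis(b["intercept"] + b["maoriList"]))

# Odds ratio associated with a one-point increase in maoriList.
or_maoriList <- unname(exp(b["maoriList"]))

c(p_avg = p_avg, p_plus1 = p_plus1, or_maoriList = or_maoriList)
```

The same conversion applies to the remaining accuracy models, since they share the binomial family with a logit link.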

3.5.2 Figure 7: Effect plot from the model presented in Table 2

Figure 7: Effect plot of phonotactic score on accuracy.

3.5.3 Table S5: Participants’ self-evaluated level of Māori proficiency with running speech phonotactics

# Model for Table S5 (Māori proficiency)
# modelTableS5 <- glmer(correct ~ c.(scoreRsSeg) + c.(familiarity) + c.(maoriProf) + (1 + c.(scoreRsSeg) + c.(familiarity) |workerId) + (1+ c.(maoriProf)|word), data=dataExp2, control=glmerControl(optimizer="bobyqa"), family=binomial(link="logit"))
# saveRDS(modelTableS5, file = "modelTableS5.rds")
modelTableS5 <- readRDS("./modelTableS5.rds")   
kable(xtable(summary(modelTableS5)$coef), digits=3, caption="Table S5 (Māori proficiency): Modeling summary for accuracy with a generalized linear mixed effects model. All numeric variables in this model are centered.")
Table S5 (Māori proficiency): Modeling summary for accuracy with a generalized linear mixed effects model. All numeric variables in this model are centered.
                    Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)            0.742       0.169    4.388     0.000
c.(scoreRsSeg)        -1.423       0.714   -1.992     0.046
c.(familiarity)        5.213       0.457   11.400     0.000
c.(maoriProf)          0.725       0.125    5.804     0.000

3.5.4 Table S6: Participants’ self-reported level of exposure to Māori with running speech phonotactics

# Model for Table S6 (Exposure to Māori)
# modelTableS6 <- glmer(correct ~ c.(scoreRsSeg) + c.(familiarity) + c.(maoriExpo) + (1 + c.(scoreRsSeg) + c.(familiarity) |workerId) + (1+ c.(maoriExpo)|word), data=dataExp2, control=glmerControl(optimizer="bobyqa"), family=binomial(link="logit")) 
# saveRDS(modelTableS6, file = "modelTableS6.rds")
modelTableS6 <- readRDS("./modelTableS6.rds")   
kable(xtable(summary(modelTableS6)$coef), digits=3, caption="Table S6 (Exposure to Māori): Modeling summary for accuracy with a generalized linear mixed effects model. All numeric variables in this model are centered.")
Table S6 (Exposure to Māori): Modeling summary for accuracy with a generalized linear mixed effects model. All numeric variables in this model are centered.
                    Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)            0.743       0.165    4.503     0.000
c.(scoreRsSeg)        -1.889       0.678   -2.786     0.005
c.(familiarity)        5.037       0.431   11.682     0.000
c.(maoriExpo)          0.388       0.052    7.397     0.000

3.5.5 Table S7: Participants’ basic knowledge of Māori with token-based phonotactics

# Model for Figure 8
# modelFigure8 <- glmer(correct ~ c.(scoreDictToken) + c.(n.neighbors) + c.(familiarity) + c.(maoriList) + (1 + c.(scoreDictToken) + c.(familiarity)|workerId) + (0 + c.(n.neighbors)|workerId) + (1 + c.(maoriList)|word), data=dataExp2, control=glmerControl(optimizer="bobyqa"), family=binomial(link="logit"))
# saveRDS(modelFigure8, file = "modelFigure8.rds")
modelFigure8 <- readRDS("./modelFigure8.rds")
kable(xtable(summary(modelFigure8)$coef), digits=3, caption="Table S7: Model summary for accuracy with a generalized linear mixed effects model. All numeric variables in this model are centered.")
Table S7: Model summary for accuracy with a generalized linear mixed effects model. All numeric variables in this model are centered.
                    Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)            0.816       0.158    5.148      0.00
c.(scoreDictToken)    -1.019       0.708   -1.438      0.15
c.(n.neighbors)        0.027       0.014    1.964      0.05
c.(familiarity)        5.589       0.460   12.156      0.00
c.(maoriList)          0.468       0.043   10.804      0.00

3.5.6 Figure 8: Effect plot from the model presented in Table S7

Figure 8: Effect plot of neighbourhood density on accuracy.

4 Supporting Information

4.1 Appendix A: Post-questionnaire

  1. How well are you able to speak Māori?
    \(\square\) Very well (I can talk about almost anything in Māori)
    \(\square\) Well (I can talk about many things in Māori)
    \(\square\) Fairly well (I can talk about some things in Māori)
    \(\square\) Not very well (I can only talk about simple/basic things in Māori)
    \(\square\) No more than a few words or phrases
    \(\square\) Not at all

  2. How well are you able to understand/read Māori?
    \(\square\) Very well (I can understand almost anything said/written in Māori)
    \(\square\) Well (I can understand many things said/written in Māori)
    \(\square\) Fairly well (I can understand some things said/written in Māori)
    \(\square\) Not very well (I can only understand simple/basic things said/written in Māori)
    \(\square\) No more than a few words or phrases
    \(\square\) Not at all

  3. Which age group do you belong to?
    \(\square\) 18 - 29
    \(\square\) 30 - 39
    \(\square\) 40 - 49
    \(\square\) 50 - 59
    \(\square\) 60+

  4. Please state your gender:

  5. Please state your ethnicity:

  6. Your highest education is:
    \(\square\) High school
    \(\square\) Undergraduate degree
    \(\square\) Graduate degree

  7. How often do you think you are exposed to the Māori language in your daily life, by means of Māori radio, Māori TV, online media?
    \(\square\) Less than once a year
    \(\square\) Less than once a month
    \(\square\) Less than once a week
    \(\square\) Less than once a day
    \(\square\) Multiple times a day

  8. How often do you think you are exposed to Māori language in your daily life, in conversation at work, at home, in social settings?
    \(\square\) Less than once a year
    \(\square\) Less than once a month
    \(\square\) Less than once a week
    \(\square\) Less than once a day
    \(\square\) Multiple times a day

  9. In the past five years, have you had any children living with you who have attended preschool or primary school in New Zealand?
    \(\square\) Yes
    \(\square\) No

  10. Please tick all boxes that apply.
    \(\square\) I can give a mihi in Māori.
    \(\square\) I can sing a few songs in Māori.
    \(\square\) I can sing the NZ national anthem in Māori.
    \(\square\) I know how to say some basic phrases (e.g. My name is…, I’m from…) in Māori.
    \(\square\) I know how to say some commands (e.g. Sit down / Come here) in Māori.
    \(\square\) I know how to say some greetings in Māori.
    \(\square\) I know how to say some numbers in Māori.
    \(\square\) I know how to say some body parts in Māori.
    \(\square\) I know how to say some colors in Māori.

  11. What region of New Zealand do you live in currently? (Please choose "overseas" if you are living outside of New Zealand.)
    \(\square\) Northland
    \(\square\) Auckland
    \(\square\) Waikato
    \(\square\) Bay of Plenty
    \(\square\) Gisborne
    \(\square\) Hawke’s Bay
    \(\square\) Taranaki
    \(\square\) Wanganui
    \(\square\) Manawatu
    \(\square\) Wairarapa
    \(\square\) Wellington
    \(\square\) Nelson Bays
    \(\square\) Marlborough
    \(\square\) West Coast
    \(\square\) Canterbury
    \(\square\) Timaru - Oamaru
    \(\square\) Otago
    \(\square\) Southland
    \(\square\) Overseas

  12. How long have you been living there?

  13. Please state your first language (the language you speak/use most of your time).

  14. What country were you living in when you first learned this language?

  15. Please list any other languages that you can speak fluently:

  16. Have you ever lived in Hawaii?
    \(\square\) Yes
    \(\square\) No

  17. Do you speak/understand any Polynesian languages such as Hawaiian, Tahitian, Sāmoan, or Tongan?
    \(\square\) Yes
    \(\square\) No

  18. If you replied yes to question 17, please state the language you know.

  19. Do you have a history of any speech or language impairments that you are aware of?
    \(\square\) Yes
    \(\square\) No

4.2 Appendix B: Stimulus materials for Experiments 1 and 2

4.2.1 List of stimuli for Experiment 1 - real words

# Keep the real-word trials and collapse the unique stimuli into one comma-separated string
realword <- dataExp1[dataExp1$type=="real",]
listword <- paste(unique(realword$word), collapse=", ")
kable(listword, caption="Table 4: List of stimuli for Experiment 1 - real words", col.names = NULL)
Table 4: List of stimuli for Experiment 1 - real words
karakia, tangi, pākehā, papa, haere mai, rua, manuhiri, rangatira, taringa, whāngai, māori, kaha, waru, karanga, reo, waewae, tupuna, mahi, ringaringa, whero, ora, iwi, papatūānuku, karu, motu, waha, taiaha, mauī, kahurangi, pokohiwi, awa, hoki, hope, atua, tekau, kaupapa, turituri, ako, toru, korowai, mokopuna, hongi, waiata, taihoa, kai moana, tiki, mihi, whakarongo, kākāriki, puke, koro, iti, rangatiratanga, whenua, waka, tāne, katoa, aoraki, moana, kaitiaki, wahine, kurī, mana, kura, taonga, poi, marae, kapa haka, tāngata, aroha, tēnā koe, ranginui, whanau, pounamu, iwa, ihu, mōrena, puku, mate, ono, hui, maunga, whare, tuakana, wāhi tapu, mere, wai, taniwha, hīkoi, hōhā, matariki, moko, tapu, whitu, hangi, nui, kōhanga, tangata whenua, māwhero, noho, tohunga, kāwanatanga, hapū, haka, wānanga, whaea, teina, kia ora, kuia, kōrero, aotearoa, pai, roto, utu, rima, kia kaha, koru, tamariki, pango, pōwhiri, tahi, kāinga, kai, rangi, whakapapa, upoko, kaiwhakahaere, tikanga, tēnā koutou, kōwhai, kaumātua, koha

4.2.2 List of stimuli for Experiment 1 - nonwords

# Keep the nonword trials and collapse the unique stimuli into one comma-separated string
pseudoword <- dataExp1[dataExp1$type=="non",]
listword <- paste(unique(pseudoword$word), collapse=", ")
kable(listword, caption="Table 5: List of stimuli for Experiment 1 - nonwords", col.names = NULL)
Table 5: List of stimuli for Experiment 1 - nonwords
māheneketoa, pukau nia, pikeko, pūrawha, titapa, huengi, pie, kepi, ape, hakaatū, ikau, rumo, tawhengawhi, mautāmu, nia, mihea, taetū, wereu, whani nia, tuwhe, pūno, mango, whenepōna, pūwhi, rahue, teu, kawaa, ahiahake, pūtio, eko, nia whihia, kūhatapō, ahatiati, moapi, kāweroni, tāmarutō, inga, mero, takamīa, howaka, hoihoko, temi, ngawhāniti, tūkeiati, whuri, ngapoto, teaori, rupo, whataī, wheu, tīahu, pahapā, kūro, hōke, nito, mōnga, nia pukau, tikaweneri, tārorangī, nia ire, hunge, moeo, tumeiroruare, mōha, komekua, mie, tākapī, poraki, kupō, kawha, ngue, pukau, taongirua, kōua, tie, amu, whani kawha, whani whani, ngoa, nitumaotaha, tītā, ngepa, kingiro, rūne, ngehi, māwi, hāno, hiu, pāwhi, tāhuma, eha, ingi, māorawau, kitō, arane, hewe, nōitia, hepiti, meahua, nia whani, nuti, paihoui, wikuruta, whihia nia, whaha, tetoua, rowa, nguta, pīhu, nue, wuri, rangu, tohiāhia, mamatōhī, mupati, tatūhe, ngae, ngaena, tikōha, apēhia, pīngi, humo, haeo, hingi, horetī, whāngaki, toketi, nia kawha, whihia, wehao, wheto, wura, puora, hupū, hiamu, uke, makei, uro, waemura, rungu, peu, nure, kōmuawhiu, pāuki, iko, rowhaohi, pāhāpāko, tetohe, whāhu, whengo, natoi, kemoramo, whani, kawha whani, tīkīhiki, nuhi, neetia, uti, pewe, reru, whani poraki, whakōiaweahua, ire, tīpe, kōioromāpara, kawha nia, nia uti, mungi, iniata, tapopa, rupa, hoengaima, rukō, ario, unati, pume, naipu, kawha kawha, nema, ngemetata, whutarirari, iru, whaiē, hūku, tikū, nopo, māorua, rume, tuanapū, pote, tīpo, paurounu, mini, ihiri, hepaua, whani titapa, nānga, kūwhati, rapeia, whehu, ino, ngema, tiwhi, uko, wawemiti, rapuko, titapa pukau, poraki pukau, touki

4.2.3 List of stimuli for Experiment 2

# Collapse the unique Experiment 2 stimuli into one comma-separated string
listword <- paste(unique(dataExp2$word), collapse=", ")
kable(listword, caption="Table 6: List of stimuli for Experiment 2", col.names = NULL)
Table 6: List of stimuli for Experiment 2
aoraki, aroha, atua, awa, whaea, whakapapa, whakarongo, whāngai, whanau, whare, whenua, whero, whitu, haere mai, haka, hangi, hapū, hīkoi, hōhā, hoki, hongi, hui, iti, iwa, iwi, kaha, kahurangi, kai, kai moana, kāinga, kaitiaki, kākāriki, kapa haka, karakia, karanga, katoa, kaumātua, kaupapa, kāwanatanga, kia kaha, kia ora, kōwhai, koha, kōhanga, kōrero, koro, korowai, koru, kuia, kura, kurī, māwhero, mahi, mana, manuhiri, marae, matariki, mauī, maunga, mere, mihi, moana, moko, mokopuna, mōrena, motu, noho, nui, ono, ora, pai, papa, papatūānuku, pōwhiri, poi, pounamu, puke, puku, rangatira, rangatiratanga, rangi, ranginui, reo, rima, ringaringa, roto, rua, tahi, taiaha, taihoa, tamariki, tāngata, tangata whenua, tāne, tangi, taniwha, taonga, tapu, taringa, teina, tekau, tēnā koe, tēnā koutou, tikanga, tiki, tohunga, toru, tuakana, tupuna, utu, waewae, wahine, wāhi tapu, wai, waiata, waka, wānanga, waru
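
The three stimulus tables in this appendix are built with the same steps: filter by stimulus type where needed, take the unique words, and paste them into one string. The following is a minimal sketch of a helper that wraps those steps, assuming a data frame with `type` and `word` columns like `dataExp1`. The function name `collapse_stimuli` and the toy data are hypothetical and are not part of the original script.

```r
# Hypothetical helper (not in the original script): collapse the unique
# stimuli of a data frame into one comma-separated string.
collapse_stimuli <- function(data, type = NULL) {
  # Optionally keep only one stimulus type ("real" or "non")
  if (!is.null(type)) {
    data <- data[data$type == type, ]
  }
  paste(unique(data$word), collapse = ", ")
}

# Toy data standing in for dataExp1 (illustrative values only)
toy <- data.frame(
  type = c("real", "real", "non", "real", "non"),
  word = c("awa", "iwi", "pie", "awa", "kepi"),
  stringsAsFactors = FALSE
)

real_list <- collapse_stimuli(toy, type = "real")  # repeated "awa" is kept once
non_list  <- collapse_stimuli(toy, type = "non")
all_list  <- collapse_stimuli(toy)                 # no filtering, as for dataExp2
```

The returned string can be passed to `kable()` with `col.names = NULL` in the same way as `listword` in the calls above.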